We  present  a  randomized  algorithm  for  the  approximate  nearest  neighbor  problem  in  d- 
dimensional  Euclidean  space.  Given  N  points  {*.,•}  in  M#  the  algorithm  attempts  to  find 
k  nearest  neighbors  for  each  of  Xj,  where  A;  is  a  user-specified  integer  parameter.  The 
algorithm  is  iterative,  and  its  CPU  time  requirements  are  proportional  to  T  ■  N  ■  (d  ■  (log  d)  + 
k  ■  (d  +  log  A;)  •  (log  IV))  +  N  ■  k2  ■  (d  +  log  A;),  with  T  the  number  of  iterations  performed. 
The  memory  requirements  of  the  procedure  are  of  the  order  N  ■  {d  +  k). 

A  byproduct  of  the  scheme  is  a  data  structure,  permitting  a  rapid  search  for  the  k  nearest 
neighbors  among  {xj}  for  an  arbitrary  point  *  £  Wl.  The  cost  of  each  such  query  is 
proportional  to  T ■  (d  •  (log  d)  +  \og(N /k)  ■  k  ■  (d  +  log  A;)) ,  and  the  memory  requirements  for 
the  requisite  data  structure  are  of  the  order  N  ■  (d  +  A;)  +  T  ■  (d  +  N). 

The  algorithm  utilizes  random  rotations  and  a  basic  divide-and-conquer  scheme,  followed 
by  a  local  graph  search.  We  analyze  the  scheme’s  behavior  for  certain  types  of  distributions 
of  { Xj },  and  illustrate  its  performance  via  several  numerical  examples. 
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1  Introduction 


In  this  paper,  we  describe  an  algorithm  for  finding  approximate  nearest  neighbors  (ANN) 
in  d-dimensional  Euclidean  space  for  each  of  N  user-specified  points  {xj}.  For  each  point 
Xj,  the  scheme  produces  a  list  of  k  ’’suspects”,  that  have  high  probability  of  being  the  k 
closest  points  (nearest  neighbors)  in  the  Euclidean  metric.  Those  of  the  ’’suspects”  that  are 
not  among  the  ’’true”  nearest  neighbors,  are  close  to  being  so. 

We  present  several  measures  of  performance  (in  terms  of  statistics  of  the  k  chosen 
suspected  nearest  neighbors) ,  for  different  types  of  randomly  generated  data  sets  consisting 
of  N  points  in  Md.  Unlike  other  ANN  algorithms  that  have  been  recently  proposed  (see 
e.g.  [1]),  the  method  of  this  paper  does  not  use  locality-sensitive  hashing.  Instead  we  use 
a  simple  randomized  divide- and-conquer  approach.  The  basic  algorithm  is  iterated  several 
times,  and  then  followed  by  a  local  graph  search. 

The  performance  of  any  fast  ANN  algorithm  must  deteriorate  as  the  dimension  d  in¬ 
creases.  While  the  running  time  of  our  algorithm  only  grows  as  d-logd,  the  statistics  of  the 
selected  approximate  nearest  neighbors  deteriorate  as  the  dimension  d  increases.  We  provide 
bounds  for  this  deterioration  (both  analytically  and  empirically),  which  occurs  reasonably 
slowly  as  d  increases.  While  the  actual  estimates  are  fairly  complicated,  it  is  reasonable  to 
say  that  in  20  dimensions  the  scheme  performs  extremely  well,  and  the  performance  does 
not  seriously  deteriorate  until  d  is  approximately  60.  At  d  =  100,  the  degradation  of  the 
statistics  displayed  by  the  algorithm  is  quite  noticeable. 

An  outline  of  our  algorithm  is  as  follows: 

1.  Choose  a  random  rotation,  acting  on  M.d,  and  rotate  the  N  given  points. 

2.  Take  the  first  coordinate,  and  divide  the  data  set  into  two  boxes,  where  the  boxes  are 
divided  by  finding  the  median  in  the  first  coordinate. 

3.  On  each  box  from  Step  2,  we  repeat  the  subdivision  on  the  second  coordinate,  obtain¬ 
ing  four  boxes  in  total. 

4.  We  repeat  this  on  coordinates  3,4,  etc.,  until  each  of  the  boxes  has  approximately  k 
points. 

5.  We  do  a  local  search  on  the  tree  of  boxes  to  obtain  approximately  k  ’’suspects”,  for 
each  point  Xj. 

6.  The  above  procedure  is  iterated  T  times,  and  for  each  point  Xj,  we  select  from  the 
T  ■  k  ”  suspects”  the  k  closest  discovered  points  for  Xj . 

7.  Perform  a  local  graph  search  on  the  collections  of  suspects,  obtained  in  Step  6  (we 
call  this  local  graph  search  ’’supercharging”).  Among  k2  ’’candidates”  obtained  from 
the  local  graph  search,  we  select  the  best  k  points  and  declare  these  ’’the  suspected 
approximate  nearest  neighbors” ,  or  ”  suspects” . 

The  data  structure  generated  by  this  algorithm  allows  one  to  find,  for  a  new  data  point  y, 
the  k  suspected  approximate  nearest  neighbors  in  the  original  dataset.  This  search  is  quite 
rapid,  as  we  need  only  follow  the  already  generated  tree  structure  of  the  boxes,  obtained  in 
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the  steps  listed  above.  One  can  easily  see  that  the  depth  of  the  binary  tree,  generated  by 
Steps  1  through  4,  is  log 2(N/k).  This  means  that  we  can  use  the  T  trees  generated,  and 
then  pass  to  Step  7. 

Almost  all  known  techniques  for  solving  ANN  problems  use  tree  structures  (see  e.g.  [1], 
[2],  [3]).  Two  apparently  novel  features  of  our  method  are  the  use  of  fast  random  rotations 
(Step  1),  and  the  local  graph  search  (Step  7),  which  dramatically  increases  the  accuracy 
of  the  scheme.  We  use  the  Fast  Fourier  Transform  to  generate  our  random  rotations,  and 
this  accounts  for  the  factor  of  log  d  that  appears  in  the  running  time.  Our  use  of  random 
rotations  replaces  the  usual  projection  argument  used  in  other  ANN  algorithms,  where  one 
projects  the  data  on  a  random  subspace.  As  far  as  we  know,  the  use  of  fast  rotations 
for  applications  of  this  type  appears  first  in  [3]  (see  [4]  and  the  references  therein  for  a 
brief  history).  The  use  of  random  rotations  (as  in  our  paper)  or  random  projections  (as 
used  elsewhere  in  ANN  algorithms)  takes  advantage  of  the  same  underlying  phenomenon; 
namely  the  Johnson-Lindenstrauss  Lemma.  (The  JL  Lemma  roughly  states  that  projection 
of  N  points  on  a  random  subspace  of  dimension  C{e)  ■  (log  IV)  has  expected  distortion 

1  +  e,  see  e.g.  [5].)  We  have  chosen  to  use  random  rotations  in  place  of  the  usual  random 
projections  generated  by  selecting  random  Gaussian  vectors.  The  fast  random  rotations 
require  0(d  ■  (log d))  operations,  which  is  an  improvement  over  methods  using  random 
projections  (see  [6],  [7]). 

The  N  x  k  lookup  table  arising  in  Step  7  is  the  adjacency  matrix  of  a  graph  whose 
vertices  are  the  points  {xj}.  In  Step  7  we  perform  a  depth  one  search  on  this  graph,  and 
obtain  <  k  +  k2  ”  candidates”  (of  whom  we  select  the  ”  suspects” ) .  This  accounts  for  the 
factor  of  k2  in  the  running  time.  Due  to  degradation  of  the  running  time,  we  have  chosen 
not  to  perform  searches  of  depth  greater  than  one. 

The  algorithm  has  been  tested  on  a  number  of  artificially  generated  point  distributions. 
Results  of  some  of  those  tests  are  presented  below. 

The  paper  is  organized  as  follows.  In  the  first  section,  we  summarize  the  mathematical 
and  numerical  facts  to  be  used  in  subsequent  sections.  In  the  second  section,  we  describe 
the  Randomized  Approximate  Nearest  Neighbors  algorithm  (RANN)  and  analyze  its  cost 
and  performance.  In  the  third  section,  we  illustrate  the  performance  of  the  algorithm  with 
several  numerical  examples. 

2  Mathematical  Preliminaries 

In  this  section,  we  introduce  notation  and  summarize  several  facts  to  be  used  in  the  rest  of 
the  paper. 

2.1  Degree  of  contact 

Suppose  that  d  >  L  >  0  are  positive  integers.  Suppose  further  that 

<T  =  <7i  . .  .(JL,  li  =  ill  Vi,iij  G  {  +  , -}  (1) 

are  two  words  of  symbols  +,  —  of  length  L.  We  define  the  degree  of  contact  Con(cr,  n) 
between  a  and  /x  to  be  the  number  of  positions  at  which  the  corresponding  symbols  are 
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different.  In  other  words 


Con(a,  /a)  =  \{i  :  1  <  i  <  L,  /  m}\ .  (2) 

In  a  mild  abuse  of  notation,  we  say  that  two  disjoint  sets  Ac  and  A (or  their  elements) 
have  degree  of  contact  j  if  Con(cr ,  /x)  =  j.  For  example,  x  and  y  have  degree  of  contact  1 
if  x  E  Aa,  V  E  A ^  and  a,  /i.  differ  at  precisely  one  symbol. 


2.2  Pseudorandom  orthogonal  transformations 

In  this  section,  we  describe  a  fast  method  (presented  in  [6],  [7])  for  the  generation  of  random 
orthogonal  transformations  and  their  application  to  arbitrary  vectors. 

Suppose  that  d,  M\ ,  M2  >  0  are  positive  integers.  We  define  a  pseudorandom  d- 
dimensional  orthogonal  transformation  0  as  a  composition  of  M±  +  M2  +  1  linear  operators 

(M\  \  /  Mi+M2  \ 

nofff  -fm.  n  Qfpf  •  (3) 

)  V  -'I  1  / 

The  linear  operators  Pj'1^  :  with  j  =  1, . . . ,  M1+M2  are  defined  in  the  following 

manner.  We  generate  permutations  ,  ttm1+m2  °f  the  numbers  {1, . . . ,  d},  uniformly 

at  random  and  independent  of  each  other.  Then  for  all  x  E  Md,  we  define  P^x  by  the 
formula 

(pjd)x)  (*)  =  i  =  l,...,d.  (4) 

In  other  words,  P^  permutes  the  coordinates  of  the  vector  x  according  to  iij.  P^  can  be 
represented  by  a  d  x  d  matrix  Pj ,  defined  by  the  formula 


1  l  =  TTj(k), 
0  l  /  TTj(k), 


(5) 


for  k,  l  =  1, . . . ,  d.  Obviously,  the  operators  P^  are  orthogonal. 

The  linear  operators  :  M.d  — >  W1  with  j  =  1, . . . ,  M1  +  M2  are  defined  as  follows.  We 

construct  (d  —  1)  •  (M\  +  M2)  independent  pseudorandom  numbers,  9j(  1), . . . ,  Oj{d—  1)  with 
j  =  1, . . . ,  Mi  +  M2,  uniformly  distributed  in  (0,  27t).  Then  we  define  the  auxiliary  linear 
operator  Qjtk  '■  ® d  — ►  for  k  =  1, . . . ,  d  —  1  to  be  the  planar  rotation  of  the  coordinates  k 
and  k  +  1  by  the  angle  6j{k).  In  other  words,  for  all  x  E  M.d 

k  \  =  (  cos(%(fc))  sin (9j(k))\  (  x{k)  \  ( 

■*’  \k  +  l)  \—sm(9j(k))  cos (Oj(k))J  \x(k  +  1) J  ’ 


and  the  rest  of  the  coordinates  of  Qj^{x)  coincide  with  those  of  x.  We  define  by  the 
formula 


Qj,d—  1  '  Qj,d— 2 


Qj,  i- 


(7) 
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Obviously,  the  operators  Q'jl>  are  orthogonal. 

The  linear  operator  F ^  is  defined  as  follows.  First  suppose  that  d  is  even 

and  that  cfe  =  d/2.  We  define  the  cfo  x  d-2  discrete  Fourier  transform  matrix  T  by  the  formula 


T(k,l ) 


1 

\f<h 


■  exp 


2ni(k  —  1  )(l  —  1) 

d~2 


(8) 


where  k,l  =  1, . . . ,  c?2  and  i  =  1.  The  matrix  T  represents  a  unitary  operator  C^2  — >  Cd2 . 

We  then  define  the  one-to-one  linear  operator  Z  :  Wl  — >  Cd2  by  the  formula 


/  x(l)  +  i  ■  x(2),  \ 

Zx=  x(3)  +  i-x(4), 

\x(2d2  —  1)  +  i  ■  x{2d2)  J 

for  all  x  G  Mrf.  Eventually,  we  define  F ^  by  the  formula 

F(d )  =  Z”1  •  T  ■  Z 


(9) 


(10) 


for  even  d.  If  d  is  odd,  we  define  F^x  for  all  x  G  by  applying  F-d~]>  to  the  first  d  —  1 
coordinates  of  x  and  leaving  its  last  coordinate  unchanged.  Obviously,  the  operators  T,  Z, 
defined  by  (8),  (9),  respectively,  preserve  the  norm  of  any  vector  x  £  M.d.  Therefore,  F^ 
is  a  real  orthogonal  transformation  Wl  — >  Mrf. 

The  cost  of  the  generation  of  a  random  permutation  (see  e.g.  [8])  is  0(d)  operations. 
The  cost  of  the  application  of  each  P-'1-1  to  a  vector  x  G  is  obviously  d  operations  due  to 
(4)' 

The  cost  of  generation  of  d  —  1  uniform  random  variables  is  0(d)  operations.  Also,  the 
cost  of  application  of  each  to  a  vector  x  G  Wd  is  O(d)  operations  due  to  (7). 

Finally,  the  cost  of  the  fast  discrete  Fourier  transform  is  0(d  ■  logd)  operations,  and  the 
cost  of  the  application  of  Fl'd>  to  a  vector  x  e  is  0(d  ■  logd)  operations  due  to  (8),  (9) 
and  (10). 

Thus  the  cost  of  the  generation  of  0  defined  via  (3)  is 


Cost(0)  =  O  (d  ■  (M\  +  M2  +  logd)) .  (11) 

Moreover,  the  cost  of  application  of  0  to  a  vector  x  £  Wl  is  also  given  by  the  formula  (11). 

Remark  1.  It  was  observed  that  if  Mi  and  M2  in  (3)  are  chosen  such  that  M1+M2  ~  logd, 
then  the  distribution  of  0  is  close  to  the  uniform  distribution  on  the  group  0(d,  M)  of 
orthogonal  transformations  from.  Wl  to  Wl. 

Remark  2.  The  use  of  the  Hadamard  matrix  (without  2x2  rotations)  appears  in  a  related 
problem  studied  by  Ailon  and  Liberty  [9]. 


3  Randomized  Approximate  Nearest  Neighbors  algorithm 

In  this  section,  we  describe  the  Nearest  Neighbor  Problem  and  present  a  fast  randomized 
algorithm  for  its  solution. 
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3.1  The  Nearest  Neighbor  Problem 

Suppose  that  d  and  k  <  N  are  positive  integers  and  suppose  that 

B  =  {xi,x2, . . .  ,xN}  C  Rd  (12) 

is  a  collection  of  N  points  in  Wl.  We  are  interested  in  finding  the  k  nearest  neighbors  of 
each  point  x The  distance  between  the  points  is  the  standard  Euclidean  distance. 

For  each  x one  can  compute  in  a  straightforward  manner  the  distances  to  the  rest  of  the 
points  and  thus  find  the  nearest  neighbors.  However,  the  total  cost  of  the  evaluation  of  the 
distances  alone  is  0(d  ■  N2),  which  makes  this  naive  approach  prohibitively  expensive  when 
N  is  large.  We  propose  a  faster  approximate  algorithm  for  the  solution  of  this  problem. 

3.2  Informal  description  of  the  algorithm 
3.2.1  Initial  selection 

The  key  idea  of  our  algorithm  is  the  following  simple  (and  well  known)  observation.  Suppose 
that  for  each  Xj  we  have  found  a  small  subset  V)  of  B  such  that  a  point  inside  V)  is  more  likely 
to  be  among  the  k  nearest  neighbors  of  Xj  than  a  point  outside  V).  Then  it  is  reasonable  to 
look  for  the  nearest  neighbors  of  each  Xi  only  inside  V)  and  not  among  all  the  points.  The 
nearest  neighbors  of  Xi  in  V),  which  can  be  found  by  direct  scanning,  are  referred  to  as  its 
” suspected  approximate  nearest  neighbors”,  or  ’’suspects”,  as  opposed  to  the  true  nearest 
neighbors  {  ) }  • 

Of  course,  many  of  the  k  true  nearest  neighbors  of  Xi  might  not  be  among  its  suspects. 
However,  one  can  re-select  V)  to  obtain  another  list  of  k  suspects  of  x%.  The  initial  guess 
is  improved  by  taking  the  “best”  k  points  out  of  the  two  lists.  This  scheme  is  iterated  to 
successively  improve  the  list  of  suspects  of  each  x,L. 

The  performance  of  the  resulting  iterative  randomized  algorithm  admits  the  following 
crude  analysis.  Suppose  that  the  size  of  Vi  is  a  ■  N,  with  a  <C  1.  Suppose  also  that  the 
number  of  the  true  nearest  neighbors  of  Xi  inside  V)  is  roughly  /3  ■  k,  with  a  <  (3  <  1.  If 
the  choice  of  V)  is  fairly  random,  then  order  0(1/0)  iterations  of  the  algorithm  are  required 
to  find  most  of  the  true  nearest  neighbors  of  each  x%.  Temporarily  neglecting  the  cost  of 
the  construction  of  V),  this  results  in  O  ((a//3)  •  d  ■  N2)  operations  instead  of  O  (d  ■  N2) 
operations  for  the  naive  algorithm.  If  a  <C  f3,  the  improvement  can  be  substantial. 

Our  construction  of  V)’s  is  based  on  geometric  considerations.  First,  we  shift  all  of  the 

points  to  place  their  center  of  mass  at  the  origin  and  apply  a  random  orthogonal  linear 

transformation  on  the  resulting  collection.  Then,  we  choose  a  real  number  y(  1)  such  that 
the  first  coordinate  of  half  of  the  points  in  B  is  less  than  y(l).  In  other  words, 

\{xeB:x(l)<y(l)}\  =  [N/2\.  (13) 

We  divide  all  the  points  into  two  disjoint  sets 

B-  =  {x  £  B  :  x(l)  <  2/(1)}  , 

B+  =  {x  <E  B  :  x(l)  >  2/(1)}  •  (14) 
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Obviously,  the  sizes  of  and  B+  are  the  same  if  N  is  even  and  differ  by  one  if  N  is  odd. 
Next,  we  set  y+(2)  to  be  a  real  number  such  that  the  second  coordinate  of  half  of  the  points 
in  B+  is  less  than  y+( 2),  i.e. 

\{x£B+:x{2)<y+{2)}\  =  [N/4\.  (15) 

We  split  B+  into  two  disjoint  sets  B _| and  B++  by  the  same  principle,  e.g. 

B+ -  =  {x  £  B+:  x{2)  <  y+( 2)}  , 

B++  =  {x£B+:  x(2)  >  y+( 2)}  .  (16) 

We  construct  B _ and  B _ |_  in  a  similar  fashion  by  using  a  real  number  y~{2 )  such  that 

the  second  coordinate  of  half  of  the  points  in  B _  is  less  than  j/_( 2),  i.e. 

\{x£B_  :x{2)  <y-(2)}\  =  [N/4\.  (17) 

Each  of  the  four  boxes  B _ ,B _ \.,B-\ _ ,  B++  contains  either  |_iV / 4J  or  \_N/A\  +  1  points. 

Then  we  repeat  the  subdivision  by  splitting  each  of  the  four  boxes  into  two  by  using  the 
third  coordinate,  and  so  on.  We  proceed  until  we  end  up  with  a  collection  of  2L  boxes 
{Bc r}  with  k  or  k  +  1  points  in  each  box.  Here  the  box  index  a  is  a  word  of  symbols  +,  — 
of  length  L  as  in  (1)  and  L  is  a  positive  integer  such  that  k  ■  2L  <  N  <  k  ■  2L+l .  In  other 
words,  L  is  defined  via  the  inequality 

k  ■  2L  <  N  <  k  ■  2l+1.  (18) 

Obviously,  the  sets  {B^{  constitute  a  complete  binary  tree  of  length  L,  whose  nodes  are 
indexed  by  words  p  of  symbols  +,  —  of  length  up  to  L.  The  set  B  is  at  the  root  of  this  tree, 
the  sets  and  B+  are  at  the  second  level,  and  so  on.  At  the  last  level  of  the  tree,  there 
are  2L  boxes.  The  depth  of  the  tree  equals  to  L  +  1. 

The  notion  of  degree  of  contact  (2)  extends  to  the  collection  {Bo-}  of  boxes.  Suppose 
that  X{  is  in  B(j.  Obviously,  the  higher  degree  of  contact  of  two  boxes  Bc 7  and  B ^  is,  the 
less  likely  a  point  of  B ^  will  be  among  the  k  nearest  neighbors  of  Xj.  Motivated  by  this 
observation,  we  define  the  set  Vi  as 

Vi  =  {x  £  B^,  :  Con{a,fx)  <  l}  .  (19) 

In  other  words,  Vi  is  the  union  of  the  box  B<j  containing  Xi  and  L  boxes  whose  degree  of 
contact  with  Bcr  is  one.  Thus  for  each  i  =  1 , ,N,  the  set  V  contains  about  k  ■  (L  +  1) 
points. 

3.2.2  “Supercharging” 

In  the  previous  subsections,  we  have  described  an  iterative  scheme  for  the  selection  of 
suspects  (suspected  approximate  nearest  neighbors)  for  each  of  the  points  x%  in  B.  Suppose 
now  that  after  T  iterations  of  this  scheme,  the  list  xs(i  ^7 . . .  ,xs^l  k^  of  k  suspects  of  each 
point  Xi  has  been  generated.  This  list  can  be  improved  by  a  procedure  we  call  supercharging. 

The  idea  of  supercharging  is  based  on  the  following  observation.  A  true  nearest  neigh¬ 
bor  of  Xi,  missed  by  the  scheme  described  above,  might  be  among  the  suspects  of  one  of 
x s(iti),  •  •  • ,  ®s(i  This  leads  to  the  following  obvious  procedure. 
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For  each  x j,  we  denote  by  At  the  list  of  suspects  of  all  xsu^, . . . ,  xsu^y  Ai  contains  k1 2 
points,  with  possible  repetitions.  We  compute  the  square  of  the  distances  from  x%  to  each 
point  in  Aj  and  find  the  k  nearest  neighbors  xyyi^y, . . . , x^^aa  of  x%  in  Ajt.  Then  we 

declare  the  (updated)  suspects  of  c c,  to  be  the  best  k  points  out  of  the  two  lists  {*s(ij)}fc_1 

and  {xt{idAi)}kj=1. 

In  other  words,  supercharging  is  a  depth  one  search  on  the  directed  graph,  whose  vertices 
are  the  points  {**}  and  whose  N  x  k  adjacency  matrix  is  the  suspects’  indices  {s(z,  j)}, 
with  i  =  1, . . . ,  N  and  j  =  1, . . . ,  k. 

3.2.3  Overview 

We  conclude  this  section  with  a  list  of  the  principal  steps  of  the  algorithm.  Given  the 
collection  {ccj}^  of  points  in  Md,  we  perform  the  following  operations. 

1.  Subtract  from  each  Xi  the  center  of  mass  of  the  collection. 

2.  Choose  a  random  orthogonal  linear  transformation  0  and  set  Xi  =  Q(xi)  for  all 
i  =  1,. . .  ,N. 

3.  Construct  2L  boxes  {B( T}  as  described  above. 

4.  For  each  Xi  define  the  set  V)  via  (19). 

5.  Update  the  suspects  *5(^1), . . . ,  °f  xi  by  using  its  true  nearest  neighbors  in  V). 

6.  Steps  2-5  are  repeated  T  times. 

7.  For  each  xt,  perform  supercharging. 

3.2.4  Query  for  a  new  point 

Suppose  that  we  are  given  a  new  point  y  €  Mci,  and  we  need  to  find  its  k  nearest  neighbors  in 
B  =  {*1, . . . ,  ccjv}.  In  this  subsection,  we  describe  a  rapid  procedure  to  find  k  approximate 
nearest  neighbors  of  y.  This  procedure  uses  the  following  information,  available  on  the  jth 
iteration  of  the  algorithm,  for  j  =  1 , ,T: 

1.  The  orthogonal  linear  transformation  Q^\  generated  on  the  jth  iteration  of  the  algo¬ 
rithm  . 

2.  The  collection  of  boxes  generated  on  the  jth  iteration  of  the  algorithm. 

To  find  k  approximate  nearest  neighbors  of  the  new  point  y  among  the  points  of  B,  we 
perform  the  following  operations.  First,  we  apply  on  y,  where  0 ■ ’)  is  the  orthogonal 
linear  transformation  of  the  first  iteration  of  the  algorithm.  The  resulting  vector  is  denoted 
by  y^\  in  other  words, 


yA)  =  0(1)  (y)  . 


(20) 


we 


Next,  in  the  collection  of  boxes  generated  on  the  first  iteration  of  the  algorithm, 

find  the  box  -^^(1)  that  has  degree  of  contact  zero  with  y  W.  Note  that  if  y  had  belonged 
to  B  in  the  first  place,  then  on  the  first  iteration  of  the  algorithm  t/1)  would  have  belonged 


to  B 


(i) 

er(i)- 


The  box  has  k  or  k  +  1  points.  Suppose  that  ,  is  is  the  union  of  the  boxes 

having  degree  of  contact  zero  or  one  with  (see  (19)).  Recall  that  ^ias  about 

(L  +  1)  •  k  points,  where  L  is  defined  via  (18). 

We  define  the  set  a  sirnilar  manner,  by  using  the  data  of  the  second  iteration 

of  the  algorithm.  We  apply  the  orthogonal  transformation  of  the  second  iteration  on 
W  to  obtain  y i.e. 


y 


w( 2)  =  ©(2)  (y(1))  =  e(2)  (©(1)  ( v )) ,  (21) 

due  to  (20).  In  the  boxes  |i3^|  of  the  second  iteration,  we  find  the  box  R^|2)’  haying 
degree  of  contact  zero  with  y^2\  Similar  to  V^.1^,  the  set  y^2)  has  about  (L  +  1)  •  k  points. 

We  repeat  this  procedure  to  construct  the  sets  IWA  for  j  =  3,4 , . . . ,  T,  where  T  is  the 
number  of  the  iterations  of  the  algorithm.  Each  contains  roughly  (L  +  1)  •  k  points. 

Finally,  we  define  the  set  A  to  be  the  union  of  all  the  sets  V^-y  in  other  words, 


A  = 


IR< 


U) 

o’O')' 


3= 1 


(22) 


The  set  A  contains  about  T  ■  k  ■  (L  +  1)  points.  The  k  nearest  neighbors  of  y  inside  A  are 
declared  to  be  the  approximate  nearest  neighbors  of  y  inside  B.  We  note  that  to  construct 
A  we  need  to  store  the  corresponding  data  on  each  iteration  of  the  algorithm. 

Once  the  k  suspects  of  y  in  B  have  been  found,  they  can  be  improved  by  performing 
supercharging  on  y  only. 


3.3  Cost  analysis 

In  this  subsection,  we  analyze  the  cost  of  the  algorithm  in  terms  of  number  of  operations. 
Also,  we  analyze  the  memory  requirements  of  the  algorithm.  We  recall  that  cei, . . .  ,xn  is 
a  collection  of  N  points  in  and  N  ~  k  ■  2L. 

1.  Centralizing  the  points  costs  0(N  ■  d). 

2.  Generating  a  pseudorandom  orthogonal  transformation  and  applying  it  to  N  points 
costs  0(N  ■  d  ■  (log d))  (see  (11)). 

3.  Constructing  the  binary  tree  of  boxes  of  depth  L  =  log 2(N/k)  costs  0(N  ■  L ). 

4.  The  suspects  of  each  point  Xi  are  selected  by  computing  the  distances  from  xt  to 
k  ■  (L  +  1)  points  in  V)  (see  (19))  and  finding  k  minimal  distances.  Thus  the  cost  of 
this  step  is  0(N  ■  L  ■  k  ■  [d  +  log  k)). 
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5.  In  supercharging,  k2  points  are  scanned  to  update  the  suspects  of  each  point.  Thus 
the  cost  of  supercharging  is  0{N  ■  k2  ■  (d  +  log  A:)). 

We  conclude  that  the  total  cost  of  the  algorithm  is 

Crann  =  0(T  ■  N  ■  (d-  (log  d)  +  k  ■  (d  +  log  k)  •  (log  N )))  + 

0(N  ■  k2  ■  (d  +  log  k)),  (23) 

where  T  is  the  number  of  iterations.  We  observe  that  for  fixed  dimension  d  and  number  of 
required  nearest  neighbors  k,  the  cost  is  0(T  ■  N  •  log  N),  as  opposed  to  0(N2)  of  the  naive 
approach.  Also,  the  cost  of  supercharging  is  quadratic  in  the  number  of  nearest  neighbors 
for  fixed  dimension  d  and  number  of  points  N,  which  makes  supercharging  expensive  relative 
to  a  single  iteration  of  the  principal  part  of  the  algorithm  even  for  moderate  k. 

Query  for  a  new  point  y  consists  of  performing  all  the  steps  of  the  algorithm  on  one 
point  only.  Consequently,  due  to  (23)  the  total  cost  of  query  for  a  new  point  is 

Cquery  —  0(T  ■  (d-  (log  d)  +  k  ■  (d  +  log  k )  •  (log  N)))  + 

0(k2  •  (d  +  logfc))-  (24) 

To  determine  the  memory  requirements  of  the  algorithm,  we  observe  that  to  store  N  points 
in  we  need  0(N  ■  d)  memory,  to  store  the  indices  of  k  nearest  neighbors  of  each  of  N 
points  we  need  0(N  ■  k )  memory,  and  to  store  a  tree  of  boxes  (with  the  corresponding 
orthogonal  transformation)  we  need  0(N)  memory. 

We  must  distinguish  between  two  cases.  In  the  first  case,  given  N  points  { X{ }  in  Wl, 
we  are  interested  in  finding  k  nearest  neighbors  for  each  Xi  only.  In  other  words,  no  query 
for  a  new  point  will  ever  be  requested.  Then,  the  tree  of  boxes  can  be  destroyed  after  each 
iteration.  Hence,  in  this  case  the  algorithm  uses  0(N  ■  ( d  +  k ))  memory.  In  other  words,  in 
this  case  the  memory  requirements  are  minimal,  in  the  sense  that  most  of  the  memory  is 
spent  on  the  storage  of  input  and  output  of  the  algorithm  only. 

In  the  second  case,  we  know  in  advance  that  queries  for  new  points  will  be  requested. 
Therefore  we  need  to  store  T  trees  of  boxes  and  the  corresponding  orthogonal  transforma¬ 
tions.  Thus,  when  queries  for  new  points  are  allowed,  the  total  memory  requirements  are 
of  the  order 


Mquery  =  0(N-(d  +  k  +  T)).  (25) 

Remark  3.  The  factor  log  A;  in  (23),  (24)  can  be  omitted,  if  the  suspects  of  every  are 
not  required  to  be  sorted  according  to  their  distance  to  xt. 

3.4  Performance  analysis 

In  this  subsection,  we  discuss  the  performance  analysis  of  the  Randomized  Approximate 
Nearest  Neighbor  algorithm  in  the  case  of  standard  Gaussian  distribution  of  the  points. 

To  be  more  specific,  we  consider  the  collection  of  N  independent  standard  Gaussian  d- 
dimensional  random  vectors  X\, . . . ,  Xjy,  where  the  number  of  points  is  given  by  the  formula 

N  =  k  ■  2l  (26) 
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for  some  positive  integer  L  >  0.  We  recall  that  for  each  point  xt,  the  algorithm  approximates 
the  k  true  nearest  neighbors  {xt(i,j)}k-i  °f  xi  by  k  suspects  {xs(i,j)}kj-i-  bi  order  to  analyze 
the  quality  of  this  approximation,  we  introduce  a  number  of  statistical  quantities.  First, 
we  define  the  average  square  of  the  distance  from  Xi  to  its  k  true  nearest  neighbors  by  the 
formula 


j-ytrue 


k 

It 

3= 1 


\Xj  —  X 


I 


(27) 


Next,  we  define  the  average  square  of  the  distance  from  x,:  to  its  k  suspects  by  the  formula 


j-^susp _ 


-kT 

3= 1 


\Xi  —  X 


s(i,j)  I 


(28) 


Finally,  we  define  the  proportion  of  the  true  nearest  neighbors  of  xt  among  its  suspects  by 
the  formula 


1 

Pr°Pi  =  k 


k 

3=1 


Fl  {xs(i,j)} 


k 

3=1 


(29) 


We  analyze  a  single  iteration  of  the  algorithm  (selection  of  suspects  without  supercharging) 
by  evaluating  the  expected  values  of  (27),  (28),  (29)  numerically  (see  Theorems  1,  2,  3 
below).  On  the  other  hand,  the  performance  of  the  algorithm  can  be  studied  empirically, 
by  running  the  algorithm  on  artificially  generated  sets  of  points. 

In  the  rest  of  this  subsection,  we  outline  the  results  of  the  analysis  (see  [10]  for  details). 
We  start  with  introducing  some  notation. 

Suppose  that  d  >  0  is  a  positive  integer,  and  A,  a,  (3  >  0  are  positive  real  numbers.  The 
distributions  x%  X2(^i-^)  and  are  defined  by  their  probability  density  functions, 

given  respectively  via  the  formulas 


fd/ 2—1  .  g— 1/2 

(30) 

4^)  -  2d/2  ■  T(d/2)  ’ 

t.  >  0, 

f  (t)  r— A/2  (V2)J  ,  (f] 

fx2(d,X)\t)  e  4  d+2j4)’ 

j= o  J ' 

t.  >  0, 

(31) 

r(a  +  /3)  1  0-! 

f-XeJOW  -  r{a)m  ■  t  (!  *)  . 

0  <  t  <  1. 

(32) 

Suppose  that  a  >  0  is  a  positive  real  number.  Suppose  also  that  d  >  L  >  0  are  positive 
integers,  and  a  £  M.d  is  a  vector  all  of  whose  coordinates  are  positive.  We  define  the  functions 
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ha  ,hfi,  ha  :  R  — >  C  via  the  formulas 


K  (®) 

=  exp 

ia2x 

1  —  2  ix 

K(x) 

=  exp 

ia2x 

1  —  2  ix 

ha{x ) 

=  exp 

ix 

1  —  2  ix 

•  erfc 


iax\J 2 


s/1  —  2  ix 


y/l  —  2  ix' 


y/l  —  2  ix 

E  «w) 

j=L+ 1 
L 


=  -K  (x), 


(33) 

(34) 


\/l  —  2  ix 


d-L 


nK{j)(X)+I2ha(i)(X)-IlhaU)( 

3= 1 


\ 


X 


7  =  1 


1=1 


(35) 


Also,  we  define  the  function  Ga  ■  (0,  oo)  — >  R  via  the  formula 


Ga(y)  = 


y  r° o 


27 r  •  (L  +  1)  Jo  J- 


e  lxt  ■  ha(x)  dx  dt. 


(36) 


The  following  theorem  provides  an  analytical  formula  for  the  expectation  E  (see 

(27)). 

Theorem  1.  Suppose  that  d,k,N  >  0  are  positive  integers.  Suppose  further  that  Dtf/ae  is 
defined  by  (27).  Then  its  expectation  is  given  by  the  formula 


E  [ D%ue ' 


/•oo  /  i  ^  rl  \ 

f  (fcEyo  AwW-W-oW*)  -fxiWdX.  (37) 

where  the  functions  fx^fB{i,N-i)  are  defined  respectively  by  (30),  (32),  and  is  the 

inverse  of  the  cdf  of  x2(d,  A)  (see  (31),). 

The  following  theorem  provides  an  approximation  to  the  expectation  E  (see 

(28)).  The  error  of  this  approximation  is  verified  via  numerical  experiments  (see  [10]  for 
more  details). 

Theorem  2.  Suppose  that  k  >  0  and  d  >  L  >  0  are  positive  integers.  We  define  the  real 
number  by  the  formula 


7 -\susp  _ 

U appr 


Avg 


k 


E°a 


k  '  L  k  -|-  1 


dX> 


(38) 


Jo  ae-SJ(A)  i=i 

where  the  function  fx i  is  defined  via  (30) ,  the  function  G^1  is  the  inverse  of  Ga  defined 

via  (36),  the  set  Sd(\)  is  the  positive  part  of  the  d-dimensional  hypersphere  of  radius  \/% 
defined  by  the  formula 

Sd( A)  =  |cc  G  McZ  :  ||cc||2  =  A,  x(j)  >0,  1  <  j  <  dj  ,  (39) 
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and  the  average  (Avg)  is  taken  with  respect  to  the  ( d  —  l) -dimensional  area.  Then, 

E  [D™SP]  =  D^ppr  +  O  (/c-1/2)  ,  k  -+  oo.  (40) 

In  other  words,  (40)  holds,  if  we  fix  d,  L  and  let  k  — >  oo. 

The  following  theorem  provides  an  approximation  to  the  expectation  E  [proper]  (see 
(29)).  The  error  of  this  approximation  is  verified  via  numerical  experiments  (see  [10]  for 
more  details). 

Theorem  3.  Suppose  that  k  >  0  and  d  >  L  >  0  are  positive  integers.  We  define  the  real 
number  Pappr  by  the  formula 


P  — 

1  appr  — 


(L  +  1) 


Avg 

aeS+(A) 


Ga 


7^—1 

x2(<4A) 


(2 


—L 


dX > 


(41) 


where  the  function  fx 2  is  defined  via  (30),  the  function  2^^  is  the  inverse  of  the  cdf  of 
X2(d,  A),  defined  via  (31),  the  function  Ga  is  defined  via  (36),  the  set  S)f (A)  is  defined  via 
(39),  and  the  average  (Avg)  is  taken  with  respect  to  the  (d  —  1)- dimensional  area.  Then, 

E  [propjy]  =  Pappr  +  O  (V1/2)  ,  (42) 

where  the  real  random  variable  proper  is  defined  via  (29).  In  other  words,  (42)  holds,  if  we 
fix  d,  L  and  let  k  — >  00. 


4  Numerical  Results 


This  section  has  two  principal  purposes.  First,  we  numerically  evaluate  the  expectations  of 
(27),  (28),  (29)  by  using  Theorems  1,  2,  3  above.  Second,  we  demonstrate  the  performance 
of  the  algorithm  empirically,  by  running  it  on  sets  of  points,  generated  according  to  the 
Gaussian  distribution.  The  choice  of  uniform  or  Hamming  distributions  instead  of  Gaussian 
results  in  very  similar  performance  (see  [10]  for  results  and  details). 

The  algorithm  has  been  implemented  in  FORTRAN  (Lahey  95  Linux  version).  The 
numerical  experiments  have  been  carried  out  on  a  Lenovo  laptop  computer,  with  DualCore 
CPU  2.53  GHz  and  2.9GB  RAM. 

Experiment  1.  In  this  experiment,  we  choose  k  =  30,  and  for  L  =  10, 15,  20,  25  set  N  via 
(26).  Then,  for  different  values  of  d  between  L  and  110,  we  approximate  the  expectations 
E  [H^ue] ,  E  [DSj))sp] ,  E  [prop^]  via  the  numerical  evaluation  of  (37),  (38),  (41),  respectively. 
The  approximations  are  denoted  by  ,  Dffifm,  Pnum ,  respectively,  and  are  accurate  to 
roughly  two  decimal  digits.  Also,  we  compute  the  ratio 


Ratio?™e 


j-\SUSp 

-L-snum 


J~)true 

J-ynum 


(43) 
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The  results  of  Experiment  1  are  shown  in  Figure  1.  In  Figure  1  (left),  we  plot  Ratio^Jg  as  a 
function  of  the  dimensionality  d,  for  each  value  of  L.  In  Figure  1  (right),  we  plot  Pnum  as  a 
function  of  the  dimensionality  d,  for  each  value  of  L  (the  number  of  points  N  is  determined 
via  (26)).  This  quantity  estimates  the  average  proportion  of  the  suspects  among  the  true 
nearest  neighbors  (found  by  one  iteration  of  RANN  without  supercharging). 

Several  observations  can  be  made  from  Experiment  1  (see  Figure  1)  and  from  more 
detailed  experiments  by  the  authors. 

1.  The  ratio  Ratio^g,  defined  via  (43),  slowly  increases  with  the  number  of  points  N, 
for  fixed  values  of  number  k  of  nearest  neighbors  and  dimensionality  d  (see  Figure  1, 
left).  In  other  words,  the  performance  of  RANN  deteriorates  with  the  number  of 
points,  if  terms  of  Ratio|“Jg.  However,  as  number  of  points  is  multiplied  by  2,  this 
ratio  grows  by  a  factor  less  than  1.1  for  most  values  of  d  on  Figure  1  (left). 

2.  The  ratio  Ratio^g  actually  decreases  with  dimensionality  d,  for  fixed  k  and  N  (see 
Figure  1,  left).  For  example,  for  N  ~  106,  this  ratio  decays  from  the  value  of  about 
1.57  at  d  =  20  to  roughly  1.3  at  d  =  110.  This  observation  is  not  surprising,  since  the 
number  of  points  within  the  same  fixed  relative  distance  (e.g.  twice  the  distance  to 
true  nearest  neighbors)  grows  with  the  dimensionality. 

3.  The  proportion  of  true  nearest  neighbors  among  suspects  decays  with  d  for  fixed  k,  N, 
as  expected  (see  Figure  1,  right).  However,  even  for  d  =  40  and  N  ~  106,  RANN 
correctly  finds  about  2.7%  of  true  nearest  neighbors,  on  merely  one  iteration  without 
supercharging.  In  other  words,  after  50  iterations  without  supercharging  RANN  will 
correctly  detect  about 


75%  =  0.75  «  1  -(1  -  0.027)50  (44) 

true  nearest  neighbors,  for  d  =  40  and  N  ~  106  (see  also  Experiment  2  below). 

Experiment  2.  In  this  experiment,  we  run  RANN  with  various  parameters  on  different 
sets  of  points  and  report  on  the  resulting  statistics. 

We  choose  the  dimensionality  d  and  the  number  of  points  N.  Then  we  choose  the  number 
of  nearest  neighbors  k,  the  number  of  iterations  of  the  algorithm  T  >  0,  and  whether  to 
perform  the  supercharging  or  not.  Next,  we  generate  N  i.i.d.  standard  Gaussian  vectors 
*i, . . . ,  x jv  in  Wl. 

We  run  RANN  on  aq, . . . ,  xjy.  For  each  x j,  RANN  finds  its  k  suspects,  xsyxy . . . ,  xsu  ky 
Also,  for  all  *  =  1, ... ,  2000,  we  find  the  list  xtuyy  . . . ,  xty  ^  of  k  true  nearest  neighbors 
of  Xi,  by  direct  scanning.  Then,  we  compute  the  quantities  D(r“e,  JJ*usp^  propj,  defined 
via  (27),  (28),  (29),  respectively,  for  *  =  1, ... ,  2000.  We  define  the  average  by  the 

formula 


1  2000 

DXo  =  2000  E  D'"“  («) 

i= 1 

The  averages  propa;9Q  are  defined  and  computed  by  the  same  token. 
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Generation  of  the  points  and  invocation  of  RANN  are  repeated  20  times,  to  obtain  the 
values  •  •  • ,  20 ),  all  defined  via  (45).  Then,  we  define  the  sample  mean  of 

^aigo  v^a  t^ie  formula 


E smpi  [DtrUe\ 


1 

20 


20 


(46) 


The  sample  means  E smpi  [Daigo\  and  E smpi  [propaZgo]  are  defined  in  a  similar  way.  Finally, 
we  compute  the  ratio 


ratio 


smpi 


E smpi  [ Dtrue }  ' 


(47) 


The  parameters  for  Experiment  2  were  chosen  in  the  following  way.  The  number  of  points 
was  N  =  30  •  212  ~  105.  The  number  of  requested  nearest  neighbors  was  k  =  15,30  or  60. 
The  dimensionality  d  was  chosen  to  be  a  multiple  of  5  between  15  and  200.  The  number  of 
iterations  of  the  algorithm  was  either  T  =  1  or  T  =  10.  The  supercharging  step  was  either 
skipped  or  performed  once  after  T  iterations. 

Most  of  the  approximations  have  been  computed  with  relative  error  up  to  2%.  The 
results  are  shown  in  Figures  2,  3  and  in  Table  1.  In  Figures  2,  3  (left),  we  plot  ratio,smp; 
(see  (47))  as  a  function  of  the  dimensionality  d  for  k  =  15  and  k  =  60,  respectively. 
In  Figures  2,  3  (right),  we  plot  E smpi  [propaZgo]  as  a  function  of  the  dimensionality  d  for 
k  =  15  and  k  =  60,  respectively.  Each  figure  contains  four  curves,  corresponding  to  one 
iteration  without  supercharging,  one  iteration  with  supercharging,  ten  iterations  without 
supercharging  and  ten  iterations  with  supercharging. 

Table  1  has  the  following  structure.  The  first  column  contains  the  dimensionality  d.  The 
next  three  columns  contain  the  CPU  time  of  10  iterations  of  RANN  without  supercharging, 
with  the  requested  number  of  nearest  neighbors  being  k  =  15, 30,  60,  respectively.  The 
last  three  columns  contain  the  CPU  time  of  the  supercharging  only  (after  10  iterations  of 
RANN),  with  the  requested  number  of  nearest  neighbors  being  k  =  15,30,60,  respectively. 
The  CPU  time  is  shown  in  seconds. 

Several  observations  can  be  made  from  Table  1  and  Figures  2,  3  and  from  more  detailed 
experiments  by  the  authors.  In  these  observations,  we  refer  to  ratiosmp;  as  ’’ratio”,  and  to 
E smpi  [propa/go]  as  ’’proportion”.  We  recall  that,  roughly  speaking,  the  proportion  measures 
how  many  of  the  true  nearest  neighbors  have  been  found  by  RANN.  On  the  other  hand,  the 
ratio  measures  how  much  average  distances  to  suspects  differ  from  the  average  distances  to 
true  nearest  neighbors. 


1.  As  expected,  for  a  fixed  d  the  performance  of  RANN  improves  as  the  number  T  of 
iterations  increases,  both  in  terms  of  ratio  and  proportion. 

2.  As  expected,  for  a  fixed  d  the  performance  of  RANN  improves  if  supercharging  is 
performed,  both  in  terms  of  ratio  and  proportion. 

3.  For  a  fixed  d,  the  performance  of  RANN  improves  as  the  number  of  requested  nearest 
neighbors  k  increases,  both  in  terms  of  ratio  and  proportion  (at  the  expense  of  running 
time) . 
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4.  For  a  fixed  d,  the  effects  of  supercharging  (especially  on  proportion)  increase  as  k 
grows.  For  example,  in  Figure  2,  right  (k  =  15)  we  observe,  that,  for  T  =  10  and 
d  =  60,  supercharging  increases  the  proportion  from  22%  to  32%.  On  the  other  hand, 
in  Figure  3,  right  ( k  =  60)  we  observe  that,  for  T  =  10  and  d  =  60,  supercharging 
increases  the  proportion  from  43%  to  74%. 

5.  As  expected,  the  performance  of  RANN  slowly  deteriorates  in  terms  of  proportion,  as 
d  increases.  On  the  other  hand,  there  is  no  significant  deterioration  of  performance 
in  terms  of  ratio,  as  d  increases. 

6.  For  T  =  10  the  ratio  is  always  below  1.1,  with  or  without  supercharging. 

7.  Even  for  as  high  a  dimension  as  d  =  60,  as  few  as  ten  iterations  with  supercharging 
correctly  determine  at  least  30%  of  the  true  nearest  neighbors.  Moreover,  the  error  of 
this  detection  decays  exponentially  with  number  of  iterations  T. 

8.  The  running  time  of  RANN,  with  or  without  supercharging,  grows  roughly  linearly 
with  dimensionality  d,  as  expected  from  (23)  (see  Table  1). 

9.  For  fixed  dimensionality  d ,  the  running  time  of  RANN  without  supercharging  grows 
roughly  linearly  with  the  requested  number  of  nearest  neighbors  k,  as  expected  from 
(23)  (see  Table  1).  For  example,  10  iterations  of  RANN  in  the  case  of  d  =  200  take 
about  80,  124  and  208  seconds  for  k  =  15, 30,  60,  respectively. 

10.  For  fixed  dimensionality  d,  the  running  time  of  the  supercharging  step  grows  roughly 
quadratically  with  the  requested  number  of  nearest  neighbors  k ,  as  expected  from 
(23)  (see  Table  1).  For  example,  in  the  case  of  d  =  200  supercharging  takes  about  15, 
57  and  235  seconds  for  k  =  15, 30,  60,  respectively. 

11.  The  algorithm  has  been  tested  on  sets  of  points  having  non-Gaussian  distribution,  e.g. 
the  uniform  distribution  in  the  d—  dimensional  cube  [0,  l]d  or  Hamming  distribution 
(i.e.  uniform  distribution  on  the  discrete  set  of  the  vertices  of  [0,  l]rf).  For  both  uniform 
and  Hamming  distributions,  the  performance  of  the  algorithm  was  very  similar  to  that 
in  the  Gaussian  case  (see  [10]  for  details). 

12.  The  algorithm  has  been  tested  on  sets  of  points,  having  Gaussian  distribution  whose 
covariance  matrix  is  not  the  identity  matrix.  These  tests  seem  to  indicate  that  the 
performance  of  RANN  improves  as  the  condition  number  of  the  covariance  matrix 
grows.  As  an  extreme  example,  if  the  points  belong  to  a  q— dimensional  linear  subspace 
of  M.d,  the  performance  of  the  algorithm  does  not  depend  on  d  (though  the  running 
time  obviously  does). 
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20 
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0.43306E+01 
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30 
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50 
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0.97832E+02 

100 
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0.48810E+02 
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0.12427E+03 
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0.14595E+02 
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Figure  1:  Number  of  points:  N  =  30  ■  210  (blue  dots),  30  •  215  (red  squares),  30  •  220  (black 
circles),  30  •  225  (green  pluses). 

k  =  15,  N  =  122,880  k=  15,  N  =  122,880 


Figure  2:  Without  supercharging:  one  iteration  (blue  dots),  ten  iterations  (black  cirlces). 
With  supercharging:  one  iteration  (red  squares),  ten  iterations  (green  pluses). 

k  =  60,  N  =  1 22,880  k  =  60,  N  =  1 22,880 


Figure  3:  Without  supercharging:  one  iteration  (blue  dots),  ten  iterations  (black  cirlces). 
With  supercharging:  one  iteration  (red  squares),  ten  iterations  (green  pluses). 
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